Richard Mansfield
Introduction
Welcome to Lesson 8! Today, you'll learn how to ensure that an XML document is free of hidden errors.
Hidden errors are the worst kind of errors because everything can seem to be working just fine, but then you get a call from a customer wanting to know why the piano arrived two months before Monica's birthday, and will you pay to store it?
Yikes. So how do you know if an XML file is error-free?
Well, the editor automatically tests any XML file you load into it to see if it's well-formed. Perhaps an end tag is missing, or misspelled. But this rudimentary error testing only detects a few kinds of obvious problems.
To ensure that your XML is also free of more significant structural or data errors, you need to go beyond just checking if the document is well-formed. You need to use a schema—a file that describes the correct structure and the types of data that the XML document must contain.
So here's what you're going to do in this lesson:
You'll also add a very useful feature to the cookbook program—a button that, when clicked, imports a recipe from any source and adds it to your recipes file.
Ready? Let's get started. I'll see you in Chapter 2!
Going Beyond Well-Formed XML
When you create or open an XML document in an XML editor, it automatically checks to see if the XML is well-formed. That means that the XML has these things right:
These simple rules are easy for an editor to check. For example, if you leave out an end tag for a <title> element in recipes.xml, the editor will alert you.
Take a look at this image. You can see the red sawtooth underlining that the editor displays to indicate a problem.
If you hover your mouse pointer over the red line in line 7, an error message pops up saying "Expecting end tag </title>".
But many XML mistakes aren't so easy to detect. XML can be perfectly well-formed and still contain serious errors. This is where schemas come in . . .
Enter Schemas
You can use a schema to ensure that your XML document is completely free of structural errors or ambiguous content. A schema file verifies that an XML document has these things:
Understanding Default, Fixed, and Empty Elements
XML includes three specialized kinds of elements: default, fixed, and empty. We won't be using them in this course, but let's take a quick look.
This means that Wisconsin is the only permitted content for this ourstate element. It can't be changed.
<product id="4434"/>
And here's how to describe an empty element in a schema:
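The original schema snippets for these three cases are not shown here, so the following is only a minimal sketch of how default, fixed, and empty elements are commonly declared. The `color` element and its `red` default are hypothetical; `ourstate` and `product` come from the text above, and the `xs:positiveInteger` type for `id` is an assumption.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Default: "red" is supplied when the element is present but empty (hypothetical element) -->
  <xs:element name="color" type="xs:string" default="red"/>

  <!-- Fixed: Wisconsin is the only permitted content -->
  <xs:element name="ourstate" type="xs:string" fixed="Wisconsin"/>

  <!-- Empty: a complex type with an attribute but no child elements or text -->
  <xs:element name="product">
    <xs:complexType>
      <xs:attribute name="id" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```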
Creating Automatic Schemas
Our recipes.xml data structure is pretty simple. But XML structures can be complex, and writing a schema by hand can be tedious and difficult. So you'll just want to use VS's built-in schema generator to create a schema file for you. Then, if some changes are necessary, you can edit it by hand.
Let's give it a try.
That was quick! A new tab appears, displaying the schema file, recipes.xsd (XML Schema Definition) that VS created for you:
As you can see, a schema is a kind of blueprint showing how an XML document is organized and what type of data it contains (in this case, text strings: xs:string).
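For reference, a generated schema for a recipes file usually looks something like the sketch below. The root element name `recipes` is an assumption (the lesson doesn't show it), so the file VS produces for you may differ in names and attributes.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Root element name is assumed for illustration -->
  <xs:element name="recipes">
    <xs:complexType>
      <xs:sequence>
        <!-- Any number of recipe elements, each with a title and instructions -->
        <xs:element maxOccurs="unbounded" name="recipe">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string" />
              <xs:element name="instructions" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```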
All right, so let's validate the XML document to see what happens if the XML code conflicts with the schema. Follow these steps:
<xs:element name="title" type="xs:integer" />
This will generate an error, because our titles are always text, but we changed the schema to say that they should be numbers.
4. If you see a check in the Use column next to recipes.xsd, skip down to step 8 in this list. Otherwise, in the XML Schemas dialog box, click the Add button.
5. Double-click recipes.xsd in the C:\XML L8 Finished\Validator folder.
6. On the left side of the XML Schemas dialog box, click the down arrow in the Use column.
7. Choose Use This Schema for recipes.xsd.
8. Click OK to close the dialog box.
9. Choose View > Error List, and you'll see that the editor found three places in your XML that didn't agree with the schema (three text types that the schema says should be numbers):
How did you do? Think you got the hang of it? Take a look at this screen recording to watch me walk through those steps:
Our goal in this exercise is to see what happens when there's a conflict between a schema and the XML document that it supposedly describes. So here in the schema file, let's change the string to "integer" to describe the "title." We know this is wrong, and so this should generate an error for us when we actually do the validation.
Okay. Let's associate these two files. Go back to the recipes.xml and press Alt, Enter to reveal the Properties window. Here in the Schemas entry, you can click this button. And you'll see that, if you need to, you can click the Add button, but we've already got recipes.xsd right here, so we don't need to go down to the hard drive and locate it. Okay. Click over here and then choose "Use this schema" and click OK to close the dialog box. Now you have them associated.
Notice that here in the XML, the title elements are all underlined, and you get a message. Well, let's take a look here at the error list and see what we get. Warnings. We're told that this title element is invalid, and we're also told that it's not a valid integer, which it certainly isn't. So there you go, it works.
Display All Files in Solution Explorer
By default, Solution Explorer doesn't list what it calls Miscellaneous Files. But I find it's sometimes useful to see all the files in a project. So to make them all visible, choose Tools > Options > Documents, and then click Show Miscellaneous files in Solution Explorer. Click the OK button to close the dialog box. Now you should see your XML and XSD files listed in the Solution Explorer window.
Great work! Now, meet me in the next chapter, where we'll tackle "bad" data.
Avoiding Bad Data
As you know, XML is an excellent way to store data. But it's also a great way to communicate it. Organizations often use XML to send data back and forth, and they include schema files to prevent misunderstandings.
Perhaps even more important than catching structural errors, a schema can also let you know if your XML content is wrong.
Consider this mix-up: Your company sent a boatload of armchairs from North Carolina to a customer in Quebec two months late, and as a result that customer couldn't fulfill hundreds of holiday gift orders. Their own customers are angry, and the company is beyond angry.
What went wrong? Your order fulfillment software misinterpreted their 2016-12-10 Québécois date format. In the United States, 12-10 means December 10, but in Quebec, it's October 12.
Whoops! To prevent problems like this next time, you'll want to ask your customers to send a schema along with their XML order document. The schema will contain a date pattern that can't be misunderstood. Perhaps it'll be the official XML date pattern: YYYY-MM-DD—where 2016-10-12 has to mean October 12.
<shipdate type="date">2016-10-12</shipdate>
There are many official XML data types and subtypes you can use with schemas. Here are some of the most common:
But in addition to the major data types, there are many derived types (or subtypes), such as token (a string with no line breaks, tabs, leading or trailing spaces, or runs of multiple spaces) and positiveInteger. And you can further refine types by using constraints. For example, with the decimal type, you can limit the number of decimal places.
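Here is a small sketch of how built-in types, derived types, and constraints might appear together in a schema. Only `shipdate` comes from this lesson; the `price` and `quantity` elements are hypothetical. Note that `fractionDigits` sets an upper limit on decimal places rather than an exact count.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Built-in date type: accepts 2016-10-12, rejects 12-10-2016 or 10/12/2016 -->
  <xs:element name="shipdate" type="xs:date"/>

  <!-- Hypothetical price element: a decimal limited to two decimal places -->
  <xs:element name="price">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:fractionDigits value="2"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- Hypothetical quantity element: a derived type allowing only whole numbers above zero -->
  <xs:element name="quantity" type="xs:positiveInteger"/>
</xs:schema>
```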
Just Google "xml schema data types" for a complete list. But always include data type specifications in your schema whenever you need to avoid ambiguity.
Let's Chat!
Date mix-ups are notorious, because different countries—even people in the same country—format dates differently. Can you think of another danger zone, another type of information that might cause problems because people express it in various ways? Join your classmates in the Discussion Area to share your ideas.
Creating a VB Validator
Earlier in this lesson, you validated the recipes.xml file using the validation tool built into VS. Now let's see how to write a VB program that validates. You'll import the .NET System.Xml.Schema namespace and use a couple of its commands.
Not so fast, I can hear you saying. What's the point of writing this VB program? Why not just use the built-in VS Schema validating feature like we just did in this lesson?
Good question.
XML Challenge!
Can you think of a reason why using a VB validating program would be better than just using the validator built into VS?
Three reasons:
The C:\XML L8 Finished folder includes a VB Validator program. You might want to view it now in VS to test the code or customize the program. To see the Validator, double-click the Validator.sln file in your C:\XML L8 Finished\Validator folder. It will load into the VB editor.
Okay, let's look at the code that makes the validator do its job.
Looking at the Code
strMessage += e.Message & vbCrLf & vbCrLf
The += means append e.Message to strMessage. You can accomplish the same thing by using this alternative code line:
strMessage = strMessage & e.Message & vbCrLf & vbCrLf
+= is just a shortcut way of appending new text to a text variable. The vbCrLf symbol is like pressing ENTER—it moves the text down one line to separate each error with some whitespace.
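To see where a line like the one above fits, here is a minimal console sketch of a schema validator. It is not the Validator project's actual code: it assumes recipes.xml and recipes.xsd sit in the program's working folder, it writes to the console instead of a message box, and the names ValidatorSketch and OnValidationError are made up for illustration.

```vbnet
Imports System
Imports System.Xml
Imports System.Xml.Schema
Imports Microsoft.VisualBasic

Module ValidatorSketch
    ' Collects every validation message so they can be shown together.
    Private strMessage As String = ""

    Sub Main()
        Dim settings As New XmlReaderSettings()
        ' Passing Nothing tells the schema set to use the namespace declared in the schema itself.
        settings.Schemas.Add(Nothing, "recipes.xsd")
        settings.ValidationType = ValidationType.Schema
        AddHandler settings.ValidationEventHandler, AddressOf OnValidationError

        ' Reading the document from start to finish triggers the validation.
        Using reader As XmlReader = XmlReader.Create("recipes.xml", settings)
            While reader.Read()
            End While
        End Using

        If strMessage = "" Then
            Console.WriteLine("No validation errors found.")
        Else
            Console.WriteLine(strMessage)
        End If
    End Sub

    ' Called once for each problem the validating reader finds.
    Private Sub OnValidationError(sender As Object, e As ValidationEventArgs)
        strMessage += e.Message & vbCrLf & vbCrLf
    End Sub
End Module
```

Because the handler only appends to strMessage, the reader keeps going after the first problem and reports every mismatch in one pass.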
The End Command
If you're wondering about the End command in line 23, it's not essential. It just prevents VB from displaying Form1. And we're using a message box in this program rather than the form to communicate with the user, so the form can remain hidden.
Testing the Validator
To test the validator, change the title data type to xs:integer in the .xsd file. Recall that this will trigger an error because the title elements are text, not numeric digits:
Then, after saving the recipes.xsd file with this change, press F5 to run the validation. You should see the message box shown here:
Importing a Recipe From the Clipboard
What's the easiest way for the user to add new recipes to their cookbook? Import them. It takes only three steps:
The program automatically adds all the necessary XML tags and saves the recipe into the recipes.xml file.
You can import recipes this way from the Internet, from Notepad, or from anywhere else you can select a recipe. Just one word of warning: The cookbook program distinguishes a recipe's title from its instructions by a carriage return (pressing the ENTER key to move to a new line). So you should have at least two paragraphs in any recipe you select—the first paragraph is the title, followed by as many additional paragraphs as you want for the instructions. The cookbook program will warn you if you attempt to import a single-paragraph recipe.
If a recipe's instructions are super long and overflow the textbox, the user can press the down-arrow or up-arrow keys to scroll the text. Alternatively, you can add a scrollbar. Just click the textbox in the Design window to select it, and then click the down arrow next to ScrollBars in the Properties window, and choose the Vertical scrollbar.
Okay, now we'll examine the code I wrote to import recipes:
Exploring the Code
Follow these steps:
Let's look at the code this sub uses to import from the clipboard—and we'll discuss how it works.
Dim WholeRecipe As String
Dim strTitle As String
Dim strInstructions As String
Dim intPointer As Integer
On Error GoTo PROBLEM
WholeRecipe = Clipboard.GetText
These two lines delete any carriage returns, line feeds, tabs, or other such whitespace formatting. They stop when they reach the first character that isn't whitespace. This leaves your recipe clean, containing only ordinary text. If you don't remove this formatting stuff, your title might be simply a tab character rather than text. Recipes that the user copies from the Internet are especially likely to contain all sorts of debris.
intPointer = WholeRecipe.IndexOf(ControlChars.CrLf, 0)
Then, you put the text characters to the left of the CrLf (characters from zero to the pointer) into the strTitle variable. Finally, you put all the characters to the right of the pointer (plus 2, because CrLf is a pair of characters and you don't want to include either one) into the strInstructions variable.
strTitle = WholeRecipe.Substring(0, intPointer)
strTitle = strTitle.Replace("'", "")
strInstructions = WholeRecipe.Substring(intPointer + 2)
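If you'd like to experiment with the trim-and-split logic outside the cookbook program, here is a self-contained console sketch. SplitRecipe and the sample text are hypothetical, not part of the lesson's project. The sketch uses the parameterless TrimStart() and TrimEnd(), which strip whitespace, and it skips both characters of the CrLf pair when extracting the instructions.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitSketch
    ' Hypothetical helper: splits clipboard-style text into a title and instructions.
    Sub SplitRecipe(wholeRecipe As String, ByRef title As String, ByRef instructions As String)
        ' Remove whitespace (CR, LF, tabs, spaces) from both ends of the text.
        wholeRecipe = wholeRecipe.TrimStart().TrimEnd()

        ' Find the first line break; -1 means the text is a single paragraph.
        Dim pointer As Integer = wholeRecipe.IndexOf(ControlChars.CrLf, StringComparison.Ordinal)
        If pointer < 0 Then
            Throw New ArgumentException("A recipe needs a title line followed by instructions.")
        End If

        ' Everything before the line break is the title, minus any apostrophes.
        title = wholeRecipe.Substring(0, pointer).Replace("'", "")
        ' Everything after the two-character CrLf pair is the instructions.
        instructions = wholeRecipe.Substring(pointer + ControlChars.CrLf.Length)
    End Sub

    Sub Main()
        Dim sample As String = vbCrLf & vbTab & "Uncle Bob's BBQ Sauce" & vbCrLf & "Simmer for an hour."
        Dim title As String = ""
        Dim instructions As String = ""
        SplitRecipe(sample, title, instructions)
        Console.WriteLine(title)
        Console.WriteLine(instructions)
    End Sub
End Module
```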
Notice the Replace command in the code above. This strips off any apostrophes in the title. If a recipe's title contains an apostrophe, like Uncle Bob's BBQ Sauce, the recipe will import, display correctly in the listbox, and be stored in the XML file. But if you ever click on that title in the listbox, the program will crash and display an "invalid token" error message. XPath, unfortunately, wasn't designed to handle apostrophes inside text strings (or, for that matter, quotation marks).
The problem is that XPath itself uses apostrophes for its own purposes: to indicate the end of a text string. As a result, the following code in the lstTitles_Click event chokes on any apostrophes within title text:
IndividualRecipe = doc.SelectSingleNode("descendant::recipe[title='" & lstTitles.SelectedItem & "']")
My workaround is to just strip apostrophes from titles. (You can use them in the instructions with no problem.) If you really want to use apostrophes in your titles, feel free to look for a solution. You'll find a couple of suggestions in the Supplementary Material section of this lesson.
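One possible approach, offered as a sketch rather than the lesson's method, is to build the XPath text string with a small helper instead of always wrapping it in apostrophes. ToXPathLiteral is a hypothetical name, and the root element `recipes` in the sample XML is an assumption. The helper picks whichever quote character the title doesn't contain, and falls back on XPath's concat() function when the title contains both kinds.

```vbnet
Imports System
Imports System.Xml

Module XPathQuoteSketch
    ' Hypothetical helper: wraps a value so it is a legal XPath 1.0 string expression.
    Function ToXPathLiteral(value As String) As String
        If Not value.Contains("'") Then Return "'" & value & "'"
        If Not value.Contains("""") Then Return """" & value & """"
        ' Both quote kinds are present: join the apostrophe-free pieces with concat().
        Dim parts As String() = value.Split("'"c)
        Return "concat('" & String.Join("', ""'"", '", parts) & "')"
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        ' Sample document; the root element name is assumed for illustration.
        doc.LoadXml("<recipes><recipe><title>Uncle Bob's BBQ Sauce</title>" &
                    "<instructions>Simmer for an hour.</instructions></recipe></recipes>")

        Dim wanted As String = "Uncle Bob's BBQ Sauce"
        Dim found As XmlNode = doc.SelectSingleNode(
            "descendant::recipe[title=" & ToXPathLiteral(wanted) & "]")

        If found IsNot Nothing Then
            Console.WriteLine(found.SelectSingleNode("instructions").InnerText)
        End If
    End Sub
End Module
```

With a helper like this, the Replace line that strips apostrophes from titles would no longer be needed, though you'd want to test it against your own recipes first.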
strInstructions = strInstructions & vbCrLf & vbTab & vbTab & Now.ToString("MMMM, yyyy")
The Now command provides the current date. VB includes many formats you can use for dates and time. For example, if you change yyyy to yy, you'll display May, 14 rather than May, 2014. Just Google "VB date format" to see all the variations.
lstTitles.Items.Add(strTitle)
txtInstructions.Text = strInstructions
lstTitles.SelectedIndex = lstTitles.FindString(strTitle)
Dim IndividualRecipe As XmlElement = doc.CreateElement("recipe")
Dim title As XmlElement = doc.CreateElement("title")
Dim instructions As XmlElement = doc.CreateElement("instructions")
title.InnerText = strTitle
Copy the imported instructions text into the new <instructions> element's InnerText:
instructions.InnerText = strInstructions
IndividualRecipe.AppendChild(title)
IndividualRecipe.AppendChild(instructions)
doc.DocumentElement.AppendChild(IndividualRecipe)
doc.Save(RecipesFilePath)
Exit Sub
PROBLEM:
MsgBox("This recipe cannot be imported. Either there is an image in the clipboard instead of text. Or the text is a single paragraph. If so, press the Enter key after the recipe title. Then try importing again.")
If something goes wrong, it's far better to display a message to the user than to have your program abruptly stop with no explanation. Users justifiably hate that. Instead, describe the problem and tell them the solution.
It's necessary to put the Exit Sub command above the error handler label PROBLEM. If you don't do that, the program will display this error message every time the Import event executes—whether or not there was a problem.
By adding the Exit Sub command, the code executes normally and then quits (exits) without displaying this message to the user. Only if VB runs into a problem does execution jump over all the code (not attempting to import) and land here at the PROBLEM label where it tells the user why the importing failed.
Recall that early on in this event, we told VB "if something goes wrong, stop executing any further instructions and go directly to the PROBLEM: label."
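For comparison, the same "skip the message on success, show it on failure" flow can be sketched with VB's structured Try...Catch block. This is a swapped-in technique, not what the lesson's code uses, and ExtractTitle is a hypothetical stand-in for the Import event's work.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TryCatchSketch
    ' Hypothetical stand-in for part of the Import event: returns the first paragraph.
    Function ExtractTitle(wholeRecipe As String) As String
        Dim pointer As Integer = wholeRecipe.IndexOf(ControlChars.CrLf, StringComparison.Ordinal)
        ' Substring throws an exception when pointer is -1 (no line break was found).
        Return wholeRecipe.Substring(0, pointer)
    End Function

    Sub Main()
        Try
            Console.WriteLine(ExtractTitle("A single paragraph with no line break"))
        Catch ex As Exception
            ' Runs only when something in the Try block fails, so no Exit Sub is needed.
            Console.WriteLine("This recipe cannot be imported: " & ex.Message)
        End Try
    End Sub
End Module
```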
Using the Dreaded GoTo Command
We're playing with fire in this lesson's code by using the notorious and widely condemned GoTo command. In the early days of personal computer programming (1970-1980), programmers badly misused GoTo. People wrote code that jumped around here and there with wild abandon. The path of execution was so complicated, and the code so hard to read and maintain, that people called it spaghetti code. Instead of stepping down from line to line in order, execution would leap here, then there, all over the place.
Teachers rightly warned their students to avoid GoTo, advising that they should create a new sub any time they're tempted to use GoTo.
GoTo has survived for use in this one, specialized error-handling technique (On Error GoTo) that we're using in our code. Even so, some experts still prefer that you use centralized error-handling (creating a sub that handles errors for the whole program).
All right, it's time to put this importing into practice.
Importing Recipes
To practice using the cookbook program's Import feature, follow these steps:
http://richardm52.wordpress.com/
That's all there is to it! You've added a new recipe to the recipes.xml file. And whenever you want to add a new recipe after typing it into Notepad or Word or some other editor—just follow the same select-copy-click steps.
Nothing could be easier than importing recipes into the cookbook. Just locate something, for example, on the Internet that you want to copy, drag your mouse to select it, and then press Control, C, or you can right-click the selection and choose Copy. Then, switch to your Cookbook program. We'll press F5 to start it running here. And then click the Import button and there you are: Same-Day Sweet Pickles, the recipe we just imported.
Summary
Today you learned how to ensure the integrity of XML documents by enforcing rules listed in a schema. Called validating, this process prevents not only structural corruption but also insidious data ambiguity. You found out how easy it is to create schema files by letting the VS editor do the heavy lifting, and you practiced validating the recipes.xml file two ways:
Then you added a significant new feature to the cookbook program: the ability to add recipes to the recipes.xml file by importing them from the Internet, Notepad, or any other digital source.
The next lesson is all about graphics—an area of computing that programmers and computer experts often overlook!
Supplementary Material
Here you'll find Microsoft's solid tutorial on the virtues of schemas.
Here's a good site if you'd like to find some example code for something you're working on.
This site has very active and quick feedback.
This site often has good answers, but it's usually technical and advanced.
Check out this discussion about the XPath apostrophe problem. If you're patient, you might figure out a way to handle apostrophes in our titles. Let us know in the Discussion Area if you succeed!
Q: Can I specify inside an XML file the schema that validates it?
A: Sure. It's common to link an XML file to its XSD file. As you saw in this lesson, VS doesn't automatically check a schema file if you simply load an XML file. How would it know where the schema file is located? To make VS (or any other program that can analyze and validate XML) use a schema, just reference the schema inside the XML.
For example, add this highlighted code as an attribute of the root element in the recipes.xml file:
The SchemaLocation here doesn't provide a filepath—merely the name of the schema file: recipes.xsd. This is the simplest way to specify the location of a schema. You don't need to provide the filepath, because you stored the XSD file in the same folder as the recipes.xml file it works with.
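As a rough sketch of what such a reference can look like, the fragment below assumes that recipes.xml uses no XML namespace and that its root element is named `recipes` (neither detail is shown in this lesson). For a document without a namespace, the attribute defined for this purpose is xsi:noNamespaceSchemaLocation. The related xsi:schemaLocation attribute expects pairs of namespace name and schema location.

```xml
<!-- Root element name and recipe contents are placeholders for illustration -->
<recipes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="recipes.xsd">
  <recipe>
    <title>Same-Day Sweet Pickles</title>
    <instructions>Instructions go here.</instructions>
  </recipe>
</recipes>
```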
But if the XSD file is located somewhere else on your hard drive, you would then have to specify a full filepath, like this:
Notice all those extra slashes that you must use. A normal Windows file path looks like this with only single slashes:
C:/XML L8 Finished/Cookbook/recipes.xsd
And if you're sending XML files to other companies, you have to add your company's Web address to the XSD file location, like this:
xsi:schemaLocation="http://www.YourCompanyName.com/OurSchemas/recipes.xsd">
Q: What should I do if I can't find good example code on the Internet?
A: You'll usually find what you're looking for, but you can also visit a forum and post a question. Check out the Supplementary Material section for some helpful fora. But if you're searching for a technique too new or too obscure, you might have to resort to persistence. In other words, keep digging until you find your answer (or you realize there is no answer).
For example, I struggled for 2 hours to solve a problem in this lesson's cookbook code (removing carriage returns—whitespace—from the start of a recipe before importing it). Here's the solution I eventually found:
WholeRecipe = WholeRecipe.TrimStart(Nothing)
The example code I found on the Internet wasn't much use, and I assumed that asking on a forum would probably result in similarly unhelpful answers. I already knew a couple of ways to remove unwanted characters from text, but those methods were cumbersome—involving loops, character arrays, and other complications. And those same techniques were all I found in the online code examples.
When I Googled for things like "vb remove carriage returns," I got lots of complicated suggestions. Some people even said you have to use loops, that there's no other way. I tested several versions myself, and couldn't find anything straightforward and simple.
But I kept going. There must be an efficient, single-command solution, I said. (I didn't say this out loud, of course. Some of your friends and relatives are likely to think that programming makes you a mad scientist. No use adding fuel to the fire by mumbling to yourself!)
Do I have to use multiple lines of code to solve this problem? I knew that Microsoft was always adding new string-manipulation commands. There must be some straightforward way to strip off nonalphabetic characters from the start of a string. It's not that rare a task.
Finally, I came upon the TrimEnd and TrimStart commands. They're so obscure that they get little mention on the Internet, they're entirely missing from the VB editor's intellisense code-completion tools, and they're also left out of most of the string-related Microsoft online Help documentation (see the VB String Manipulation Summary for VS 2013 where you won't find them).
But these two commands work beautifully. You can list individual characters you want to remove, but if you specify Nothing as I did in the Import event in this lesson, these Trim commands strip off all whitespace characters (carriage returns, line feeds, tabs, and spaces). Perfect.
In this lesson, we appended the date to any new recipe imported into the cookbook. But people have different tastes: Maybe you'd rather prepend it. Or maybe you want it flush left (by omitting the tabs).
Recall that in the code I wrote for the Import feature, I used a carriage return and a couple of tabs to position the date at the bottom of the instructions on a line by itself and two tabs over from the left. Here's my code:
strInstructions = strInstructions & vbCrLf & vbTab & vbTab & Now.ToString("MMMM, yyyy")
See if you can modify this line of code to put the date on a line by itself at the start of the instructions.
strInstructions = Now.ToString("MMMM, yyyy") & vbCrLf & strInstructions
Or if you do want to include the tabs:
strInstructions = vbTab & vbTab & Now.ToString("MMMM, yyyy") & vbCrLf & strInstructions